Search CORE

78 research outputs found

PBNS: physically based neural simulation for unsupervised garment pose space deformation

Author: Bertiche Hugo
Escalera Guerrero Sergio
Madadi Meysam
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/11/2022

We present a methodology to automatically obtain a Pose Space Deformation (PSD) basis for rigged garments through deep learning. Classical approaches rely on Physically Based Simulations (PBS) to animate clothes. These are general solutions that, given a sufficiently fine-grained discretization of space and time, can achieve highly realistic results. However, they are computationally expensive, and any scene modification requires re-simulation. Linear Blend Skinning (LBS) with PSD offers a lightweight alternative to PBS, though it needs huge volumes of data to learn a proper PSD. We propose using deep learning, formulated as an implicit PBS, to learn realistic cloth Pose Space Deformations without supervision in a constrained scenario: dressed humans. Furthermore, we show it is possible to train these models in an amount of time comparable to a PBS of a few sequences. To the best of our knowledge, we are the first to propose a neural simulator for cloth. While deep-based approaches in the domain are becoming a trend, these are data-hungry models. Moreover, authors often propose complex formulations to better learn wrinkles from PBS data. Supervised learning leads to physically inconsistent predictions that require collision solving to be used. Also, dependency on PBS data limits the scalability of these solutions, while their formulation hinders their applicability and compatibility. By proposing an unsupervised methodology to learn PSD for LBS models (the 3D animation standard), we overcome both of these drawbacks. The results show cloth consistency in the animated garments and meaningful pose-dependent folds and wrinkles. Our solution is extremely efficient, handles multiple layers of cloth, allows unsupervised outfit resizing, and can be easily applied to any custom 3D avatar.

Diposit Digital de la Universitat de Barcelona

On the design of an ECOC-compliant genetic algorithm

Author: Baró Solé Xavier
Bautista Miguel Ángel
Escalera Guerrero Sergio
Pujol Vila Oriol
Publication venue: 'Elsevier BV'
Publication date: 04/06/2013

Genetic Algorithms (GA) have previously been applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly takes into account the properties of the ECOC matrix. As a result, the considered search space is unnecessarily large. In this paper, a novel genetic strategy to optimize the ECOC coding step is presented. This strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, the analysis of the results in terms of performance, code length and number of Support Vectors (SVs) shows that the optimization process is able to find very efficient codes in terms of the trade-off between classification performance and the number of classifiers. Finally, the classification performance per dichotomizer shows that the novel proposal is able to obtain similar or even better results while defining a more compact set of dichotomies and SVs compared to state-of-the-art approaches.

The Oberta in open access

Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine

Author: Escalera Guerrero Sergio
Kiani Kourosh
Rastgoo Razieh
Publication venue: 'MDPI AG'
Publication date: 24/04/2020

In this paper, a deep learning approach, the Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how the RBM, as a deep generative model, is capable of generating the distribution of the input data for an enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used, and the hands in these cropped images are detected using a Convolutional Neural Network (CNN). After that, three types of the detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for the two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on the Massey University Gesture Dataset 2012, the American Sign Language (ASL) and Fingerspelling Dataset from the University of Surrey's Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A datasets.

Diposit Digital de la Universitat de Barcelona

Survey on 2D and 3D human pose recovery

Author: Angulo Bahón Cecilio
Escalera Guerrero Sergio
Perez Sala Xavier
Publication venue: 'IOS Press'
Publication date: 01/01/2012

Human Pose Recovery approaches have been studied in the field of Computer Vision for the last 40 years. Several approaches have been reported, and significant improvements have been obtained in both data representation and model design. However, the problem of Human Pose Recovery in uncontrolled environments is far from being solved. In this paper, we define a global taxonomy to group the model-based methods and discuss their main advantages and drawbacks. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Segmentation of RGB-D indoor scenes by stacking random forests and conditional random fields

Author: Gonzàlez Jordi
Guerrero Sergio Escalera
Moeslund Thomas B.
Thøgersen Mikkel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

VBN

Articulated motion and deformable objects

Author: Escalera Guerrero Sergio
Kittler Josef
Perales Francisco J.
Wan Jun
Publication venue: 'Elsevier BV'
Publication date: 01/02/2018

This guest editorial introduces the twenty-two papers accepted for this Special Issue on Articulated Motion and Deformable Objects (AMDO). They are grouped into four main categories within the field of AMDO: human motion analysis (action/gesture), human pose estimation, deformable shape segmentation, and face analysis. For each of the four topics, a survey of the recent developments in the field is presented. The accepted papers are briefly introduced in the context of this survey. They contribute novel methods, algorithms with improved performance as measured on benchmarking datasets, as well as two new datasets for hand action detection and human posture analysis. The special issue should be of high relevance to readers interested in AMDO recognition and should promote future research directions in the field.

University of Surrey

VBN

Diposit Digital de la Universitat de Barcelona

Surrey Research Insight

RGB-D Segmentation of Poultry Entrails

Author: Guerrero Sergio Escalera
Jørgensen Anders
Moeslund Thomas B.
Philipsen Mark Philip
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

VBN

Spatiotemporal Facial Super-Pixels for Pain Detection

Author: Guerrero Sergio Escalera
Lundtoft Dennis Holm
Moeslund Thomas B.
Nasrollahi Kamal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/07/2016

VBN

Deep learning with self-supervision and uncertainty regularization to count fish in underwater images

Author: Cantor Mauricio
Clapés i Sintes Albert
Escalera Guerrero Sergio
Tarling Penny
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2022

Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive, but has created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when they are densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework by testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate that our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals, thereby contributing effective methods to assess natural populations from the ever-increasing visual data.

PubMed Central

Diposit Digital de la Universitat de Barcelona

MPG.PuRe

Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification

Author: Guerrero Sergio Escalera
Haurum Joakim Bruslund
Madadi Meysam
Moeslund Thomas B.
Publication venue: 'Elsevier BV'
Publication date: 01/12/2022

VBN